Part I Data Cleaning

The CSV file “Samples Report of 20180710-Lisa-fullsetDIA.csv” was used. The original data had 224 variables. The decoy variable “DECOY_sp|P04114|APOB_HUMAN” was removed.

Mean count for responders and non-responders before and after treatment.

Mean intensity for responders and non-responders before and after treatment.

Below is a bar plot of the number of proteins by number of missing values. There are 9 proteins with 1 missing value, 5 with 2 missing values, and 81 proteins with no value in any sample.

Proteins with more than 5 missing values were removed. For the remaining proteins, missing values were imputed with half of the lowest value.
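The filtering and imputation rule can be sketched as follows. This is a minimal illustration, not the code used for the report: the toy data, the function name `clean_proteins`, and the use of `None` for a missing value are all hypothetical. The text does not say whether "the lowest value" is taken per protein or over the whole table, so the sketch assumes the lowest observed value of each protein.

```python
MAX_MISSING = 5  # threshold stated in the text

def clean_proteins(proteins, max_missing=MAX_MISSING):
    """Drop proteins with too many missing values, impute the rest.

    proteins: dict mapping protein name -> list of intensities,
              one per sample, with None marking a missing value.
    """
    cleaned = {}
    for name, values in proteins.items():
        n_missing = sum(v is None for v in values)
        if n_missing > max_missing:
            continue  # more than max_missing values absent: remove protein
        observed = [v for v in values if v is not None]
        if not observed:
            continue  # nothing observed, so nothing to impute from
        # Assumption: "lowest value" means the lowest observed value
        # of this protein, not of the whole data set.
        fill = min(observed) / 2
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned

# Hypothetical toy data with 8 samples per protein.
toy = {
    "A": [10.0, 12.0, None, 8.0, 9.0, 11.0, 10.0, 13.0],  # 1 missing
    "B": [None, None, None, None, None, None, 5.0, 6.0],  # 6 missing
    "C": [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0],      # complete
}
result = clean_proteins(toy)
```

In this toy example protein B is dropped because it has 6 missing values, and the single gap in protein A is filled with half of its lowest observed intensity.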

Below is a bar plot of the total count for each sample. It shows that the total amount of peptide is roughly equal across samples.

Part II Statistical Analysis

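A linear model was fitted to the log-transformed intensity data, using the formula below:

Responding + Timepoint + Responding * Timepoint + Subject + 1

The Responding * Timepoint coefficient was tested.

Fitting the model produced the following message, most likely because each subject belongs to only one response group, so one Subject term is redundant with the Responding term:

## Coefficients not estimable: Subject123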
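To illustrate what the tested interaction term measures, here is a minimal sketch under stated assumptions. It is not the code used for the report. It assumes two timepoints (before and after) and that every subject has both measurements. In that balanced case, the interaction estimate from a model with a Subject term equals the difference between the mean within-subject change in responders and in non-responders. The function name and the toy values are hypothetical, and the t statistic is an ordinary pooled-variance one, which may differ from the statistic used in the report.

```python
from math import sqrt
from statistics import mean, variance

def interaction_from_differences(before, after, responder):
    """Estimate the Responding * Timepoint effect for one protein.

    before, after: dict subject -> log-transformed intensity
    responder:     dict subject -> True (responder) or False
    Returns (estimate, t statistic, degrees of freedom).
    """
    # Within-subject change removes the subject-specific baseline.
    diffs_r = [after[s] - before[s] for s in before if responder[s]]
    diffs_n = [after[s] - before[s] for s in before if not responder[s]]

    # Interaction estimate: mean change in responders minus
    # mean change in non-responders.
    estimate = mean(diffs_r) - mean(diffs_n)

    # Ordinary two-sample t statistic with pooled variance.
    n_r, n_n = len(diffs_r), len(diffs_n)
    df = n_r + n_n - 2
    pooled_var = ((n_r - 1) * variance(diffs_r)
                  + (n_n - 1) * variance(diffs_n)) / df
    t_stat = estimate / sqrt(pooled_var * (1 / n_r + 1 / n_n))
    return estimate, t_stat, df

# Hypothetical log intensities for six subjects.
before = {"s1": 10.0, "s2": 11.0, "s3": 12.0,
          "s4": 10.0, "s5": 11.0, "s6": 12.0}
after = {"s1": 11.0, "s2": 12.5, "s3": 13.0,
         "s4": 10.0, "s5": 11.5, "s6": 12.0}
responder = {"s1": True, "s2": True, "s3": True,
             "s4": False, "s5": False, "s6": False}

estimate, t_stat, df = interaction_from_differences(before, after, responder)
```

Working with within-subject differences makes explicit why the Subject term matters: each subject acts as its own control, so baseline differences between subjects do not enter the comparison.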

Below is the distribution of the p values.

Volcano plot. Only one protein, Anthrax toxin receptor 1, has a p value < 0.05. It is not a typical HDL protein.

Below are the statistical results for each protein.

Boxplot of Anthrax toxin receptor 1